An Optimization Scheme in MapReduce for Reduce Stage

Authors

  • Qi Liu
  • Weidong Cai
  • Baowei Wang
  • Zhangjie Fu
  • Nigel Linge
Abstract

As a widely used programming model for processing large data sets, MapReduce (MR) has become essential in data clusters and grids, e.g. a Hadoop environment. Load balancing, a key factor affecting the performance of map resource distribution, has recently attracted considerable optimization effort. Current MR implementations distribute tasks to clusters using hashing with random modulo operations, which can lead to uneven data distribution and skewed loads, and thereby degrade the performance of the entire distributed system. In this paper, a virtual partition consistent hashing (VPCH) algorithm is proposed for the reduce stage of MR processes in order to achieve a balance in job allocation. In addition, experienced programmers are needed to decide the number of reducers used during the reduce phase of MR, which makes the quality of MR scripts vary. An extreme learning method is therefore employed to recommend the number of reducers a mapped task needs. Execution time is also predicted so that users can better arrange their tasks. According to the results, VPCH achieves load balancing, and the prediction model is faster than SVM while maintaining similar accuracy.
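The abstract does not give the details of VPCH, so the following is only a minimal sketch of the general idea it builds on: plain consistent hashing with virtual nodes, set beside a hash-modulo partitioner for comparison. It is not the authors' algorithm. All names (`stable_hash`, `modulo_partition`, `VirtualNodeRing`, the `reducer-{r}#vnode-{v}` labels, and the default of 100 virtual nodes) are assumptions made for illustration.

```python
import bisect
import hashlib
from collections import Counter


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() of str is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def modulo_partition(key: str, num_reducers: int) -> int:
    """Hash-modulo partitioning: the reducer index is hash(key) mod the reducer count."""
    return stable_hash(key) % num_reducers


class VirtualNodeRing:
    """Consistent-hash ring in which each reducer owns several virtual points.

    Hypothetical illustration only; the paper's VPCH may differ in every detail.
    """

    def __init__(self, num_reducers: int, virtual_nodes: int = 100):
        points = []
        for r in range(num_reducers):
            for v in range(virtual_nodes):
                # Each virtual node is a point on the ring that maps back to reducer r.
                points.append((stable_hash(f"reducer-{r}#vnode-{v}"), r))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._owners = [r for _, r in points]

    def partition(self, key: str) -> int:
        """Return the reducer owning the first ring point clockwise from the key's hash."""
        i = bisect.bisect_right(self._hashes, stable_hash(key)) % len(self._hashes)
        return self._owners[i]


def load_per_reducer(partition_fn, keys, num_reducers):
    """Count how many keys each reducer receives under the given partition function."""
    counts = Counter(partition_fn(k) for k in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]
```

For example, `load_per_reducer(VirtualNodeRing(4).partition, keys, 4)` and `load_per_reducer(lambda k: modulo_partition(k, 4), keys, 4)` can be compared on the same key set. One property of the ring that is easy to check: when a fifth reducer is added, a key either stays on its old reducer or moves to the new one, whereas under modulo partitioning a change in reducer count can reassign keys between any pair of reducers. Raising `virtual_nodes` generally evens out the share of the ring each reducer owns.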

Similar resources

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. Current Hadoop and the rack-aware data placement strategy of the existing Hadoop distributed file system assume a homogeneous cluster in which each node has the same computing capacity and is assigned the same workload. Default Hadoop d...

Clustering Social Images with MapReduce and High Performance Collective Communication

Social image clustering is a data-intensive application that poses novel challenges to high performance computing. The field already deals with 10-100 million images, represented as points in a high-dimensional (up to 2048) vector space, that are to be divided into up to 1-10 million clusters. In recent years MapReduce has become popular in processing big data problems due to its attractive ...

SEISMIC DESIGN OPTIMIZATION OF STEEL STRUCTURES BY A SEQUENTIAL ECBO ALGORITHM

The objective of the present paper is to propose a sequential enhanced colliding bodies optimization (SECBO) algorithm for implementation of seismic optimization of steel braced frames in the framework of performance-based design (PBD). In order to achieve this purpose, the ECBO is sequentially employed in a multi-stage scheme where in each stage an initial population is generated based on the ...

Coefficient of Performance Optimization of a Single Stage Thermoelectric Cooler

In thermoelectric coolers (TECs), an applied external voltage is converted into a temperature difference through the Peltier effect. The basic structure of a TEC is a single-stage device. Because of the low efficiency of thermoelectric coolers, and in particular their low coefficient of performance (COP), optimal design of the geometrical parameters of such devices is vital. For this purpose, ...

Adaptive Preshuffling in Hadoop Clusters

MapReduce has become an important distributed processing model for large-scale data-intensive applications like data mining and web indexing. Hadoop, an open-source implementation of MapReduce, is widely used for short jobs requiring low response time. In this paper, we propose a new preshuffling strategy in Hadoop to reduce the high network loads imposed by shuffle-intensive applications. Designing...

Publication date: 2016